THE INTERPRETATIONS AND APPLICATIONS OF THE
INDEX OF REDUNDANCY AND THE REDUNDANCY
TRANSFORMATIONS


DAVID E. TYLER




TECHNICAL REPORT #161, SERIES 2
DEPARTMENT OF STATISTICS
PRINCETON UNIVERSITY
JANUARY, 198#




Research supported in part by ONR Grants N0014-67L0151-0017 and N00014-75-C-0453, ARO Grant DAHC-04-74-G0178 and ERDA Grant E-lll-2310. The author wishes to thank Lawrence S. Mayer for initially suggesting the topic.


ABSTRACT 


The index of redundancy has been receiving increasing attention in disciplines which employ applied multivariate techniques, particularly psychology and education. This index purports to measure the degree to which one random vector can predict another random vector. In this paper, attention is focused on the present applications and interpretations of the index of redundancy and on the relationship between the index and other multivariate techniques. Also, simultaneous transformations of the two random vectors, which differ from the standard canonical transformations, are derived and motivated. These simultaneous transformations are shown to be naturally related to the index of redundancy.


KEY WORDS: Canonical correlation and variate analysis; Index of Redundancy; Total variance.
1. Introduction

Stewart and Love (1968) proposed an index to measure the degree to which one random vector can predict another random vector or, equivalently, how redundant one random vector is relative to another. Their index is commonly referred to as the "index of redundancy" and is becoming popular in some disciplines which employ applied multivariate techniques, particularly psychology and education. The index of redundancy is included in the applied multivariate analysis books of Cooley and Lohnes (1971, 1976), Timm (1975) and Cohen and Cohen (1975), which are popular in these fields. It is also included in articles which review multivariate techniques in these areas, such as Tatsuoka (1973) and Darlington, Weinberg and Walberg (1975). More recently, the index of redundancy has been introduced to the disciplines of geography [Briggs and Leonard (1977a)], business [Yoram (1978)], and public health [Laessig and Duckett (1979)].

Since its introduction, a number of papers on the interpretation, applications and properties of the index of redundancy have appeared in the applied literature (see Wood (1972), Nicewander and Wood (1974, 1975), Miller (1975a), Gleason (1977), Cohen and Cohen (1977), and Cramer and Nicewander (1979)). These papers are usually written in the jargon of their respective fields, which has led to some misconceptions and unresolved debates concerning the applications and interpretations of the index of redundancy.


Statisticians are increasingly likely to encounter the index of redundancy, so this paper is intended to be partially expository. In this paper, attention is focused on the present applications of the index of redundancy and on the relationship between the index and other multivariate techniques. This treatment is intended to help clarify some of the issues debated in the applied literature. Also, in the appendix of this paper, a commonly cited "property" of the index of redundancy is shown to be incorrect by means of a counterexample.

In  addition,  simultaneous  transformations  of  the  two  random 
vectors,  which  differ  from  the  standard  canonical  transformations, 
are  derived  and  motivated  in  Sections  4  and  5.  These  simultaneous 
transformations,  labeled  the  "redundancy  transformations",  are 
shown  to  be  naturally  related  to  the  index  of  redundancy.  The 
redundancy  transformations  are  suggested  for  use  when  analyzing 
the  relationship  between  two  random  vectors  in  studies  where  the 
index  of  redundancy  is  considered  a  valid  summary  index. 

2. Preliminaries

Let Y be a p-dimensional random vector and let X be a q-dimensional random vector, both of which are assumed, without loss of generality, to have zero mean. Denote the joint variance-covariance matrix of Y and X by

(2.1)  $\Sigma_{Y,X} = \begin{bmatrix} \Sigma_Y & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_X \end{bmatrix}$.

Unless otherwise stated, $\Sigma_{Y,X}$ is assumed to be nonsingular. In this paper, Y is regarded as the dependent vector and X as the independent vector.

I. TOTAL VARIANCE. A commonly used measure of the overall dispersion of the random vector Y is the total variance of Y, which by definition is $\mathrm{Trace}(\Sigma_Y)$. For B an arbitrary (p × k) matrix with rank(B) = k, the variance explained, or the variance extracted, by the set of linear combinations B'Y is defined to be the difference between the total variance of Y and the total variance of the residual vector $Y - \hat{Y}_B$, where

(2.2)  $\hat{Y}_B = \Sigma_Y B (B'\Sigma_Y B)^{-1} B'Y$

is the linear regression of Y on B'Y. The amount of the total variance of Y which can be explained by the set of linear combinations is therefore

(2.3)  $V_e(Y : B'Y) \equiv \mathrm{Trace}[\Sigma_Y B (B'\Sigma_Y B)^{-1} B'\Sigma_Y]$.

The variances extracted by uncorrelated sets of linear combinations of Y are additive. That is, if $B = [B_1 : B_2]$ with $B_1'\Sigma_Y B_2 = 0$, then

(2.4)  $V_e(Y : B'Y) = V_e(Y : B_1'Y) + V_e(Y : B_2'Y)$.




In particular, if $b_1, b_2, \ldots, b_p$ are a set of non-null vectors such that $b_i'\Sigma_Y b_j = 0$ for $i \ne j$, then

(2.5)  $\mathrm{Trace}(\Sigma_Y) = \sum_{i=1}^{p} V_e(Y : b_i'Y)$.
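A small numerical sketch can make (2.3) through (2.5) concrete. The Python code below assumes NumPy is available; the function name `explained_variance`, the random positive definite $\Sigma_Y$ and the seed are illustrative choices, not part of this report. It builds vectors with $b_i'\Sigma_Y b_j = 0$ from the inverse Cholesky factor of $\Sigma_Y$ and checks the additivity statements.

```python
import numpy as np

def explained_variance(S_Y, B):
    """Ve(Y : B'Y) from (2.3): Trace[S_Y B (B' S_Y B)^{-1} B' S_Y]."""
    B = np.asarray(B, dtype=float).reshape(S_Y.shape[0], -1)  # a vector becomes a (p x 1) matrix
    S_YB = S_Y @ B
    return float(np.trace(S_YB @ np.linalg.solve(B.T @ S_YB, S_YB.T)))

rng = np.random.default_rng(0)
p = 4
M = rng.standard_normal((p, p))
S_Y = M @ M.T + 0.5 * np.eye(p)            # an arbitrary positive definite Var(Y)

# Columns with b_i' S_Y b_j = 0 for i != j (here B' S_Y B = I), taken from the
# inverse Cholesky factor of S_Y rather than from its eigenvectors.
B = np.linalg.inv(np.linalg.cholesky(S_Y)).T

parts = [explained_variance(S_Y, B[:, i]) for i in range(p)]
print(sum(parts), np.trace(S_Y))                                 # (2.5)
print(explained_variance(S_Y, B[:, :2]), parts[0] + parts[1])    # (2.4)
```

Any complete set of $\Sigma_Y$-orthogonal vectors would do; the Cholesky choice simply shows that (2.5) does not require eigenvectors.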


When the concept of explained variance is used in practice, the random vector Y is often scaled so that each component of Y has equal variance. The more general case is used in this paper; that is, the components of Y are not assumed to have equal variances.


II. CANONICAL ANALYSIS. The most developed procedure for analyzing the linear relationship between two random vectors is canonical correlation and variable analysis. The largest canonical correlation between the random vectors X and Y, denoted by $\rho_{(1)}$, is the maximum absolute correlation between a linear combination of X, say $a_{(1)}'X$, and a linear combination of Y, say $b_{(1)}'Y$. The variables $a_{(1)}'X$ and $b_{(1)}'Y$ are called the first canonical variables for the X and Y vectors respectively. The second canonical correlation $\rho_{(2)}$ is the maximum absolute correlation between a linear combination of X uncorrelated with $a_{(1)}'X$ and a linear combination of Y uncorrelated with $b_{(1)}'Y$, say $a_{(2)}'X$ and $b_{(2)}'Y$ respectively. The canonical correlations and variables $\rho_{(i)}$, $a_{(i)}'X$ and $b_{(i)}'Y$, $i = 3, 4, \ldots, \min(p,q)$, are defined analogously.

If q > p, define $a_{(i)}$, $i = p+1, p+2, \ldots, q$, to be any vectors such that $a_{(i)}'X$ is uncorrelated with $a_{(j)}'X$ for $i \ne j$, $j = 1, 2, \ldots, q$, and $i = p+1, p+2, \ldots, q$. If p > q, define $b_{(i)}$, $i = q+1, q+2, \ldots, p$, in an analogous manner. For completeness, $a_{(i)}'X$, $i = p+1, p+2, \ldots, q$, or $b_{(i)}'Y$, $i = q+1, q+2, \ldots, p$, are considered canonical variables associated with the canonical correlation $\rho_{(i)} = 0$, for $p+1 \le i \le q$ or $q+1 \le i \le p$, whichever the case.

To further specify the canonical variables, it is conventional to choose $a_{(i)}$ and $b_{(j)}$ so that $a_{(i)}'\Sigma_X a_{(i)} = 1$, $i = 1, 2, \ldots, q$, and $b_{(j)}'\Sigma_Y b_{(j)} = 1$, $j = 1, 2, \ldots, p$.

Let $A_*$ be the (q × q) matrix with columns $a_{(1)}, a_{(2)}, \ldots, a_{(q)}$, and let $B_*$ be the (p × p) matrix with columns $b_{(1)}, b_{(2)}, \ldots, b_{(p)}$.

When the transformations $A_*'$ and $B_*'$ are applied simultaneously to the random vectors X and Y respectively, they are referred to as the canonical transformations. The transformed vectors $B_*'Y$ and $A_*'X$ have the much simplified joint variance-covariance matrix

(2.6)  $\begin{bmatrix} B_*'\Sigma_Y B_* & B_*'\Sigma_{YX} A_* \\ A_*'\Sigma_{XY} B_* & A_*'\Sigma_X A_* \end{bmatrix} = \begin{bmatrix} I_p & C \\ C' & I_q \end{bmatrix}$,

where

$C = [\Lambda : 0]$ if p < q,  $C = \Lambda$ if p = q,  $C = \begin{bmatrix} \Lambda \\ 0 \end{bmatrix}$ if p > q,

with $\Lambda$ being a diagonal matrix of order min(p,q) having $\rho_{(i)}$ as the ith diagonal element.

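As defined, the canonical correlations and vectors satisfy the identities

(2.7)  $\Sigma_{XY}\Sigma_Y^{-1}\Sigma_{YX}\, a_{(i)} = \rho_{(i)}^2\, \Sigma_X a_{(i)}$,
       $\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}\, b_{(i)} = \rho_{(i)}^2\, \Sigma_Y b_{(i)}$, and
       $\rho_{(i)}\, a_{(i)} = \Sigma_X^{-1}\Sigma_{XY}\, b_{(i)}$.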

An important property of canonical correlations and variables is that they are co-ordinate free concepts. That is, they are invariant under nonsingular linear transformations of Y and nonsingular linear transformations of X.
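The co-ordinate free property can be checked numerically. The following NumPy sketch is only an illustration: the function name `canonical_correlations`, the random joint matrix and the transformation matrices are assumptions, not part of the report. It obtains the squared canonical correlations as eigenvalues implied by (2.7) and compares them before and after nonsingular transformations of Y and X.

```python
import numpy as np

def canonical_correlations(S, p):
    """rho_(1) >= ... >= rho_(min(p,q)) for the joint matrix (2.1); the first p coordinates are Y."""
    S_Y, S_YX, S_X = S[:p, :p], S[:p, p:], S[p:, p:]
    Linv = np.linalg.inv(np.linalg.cholesky(S_Y))     # S_Y = L L'
    # By (2.7) the rho_(i)^2 are the eigenvalues of S_Y^{-1} S_YX S_X^{-1} S_XY; the
    # similar symmetric matrix L^{-1} S_YX S_X^{-1} S_XY L^{-T} has the same eigenvalues.
    K = Linv @ S_YX @ np.linalg.solve(S_X, S_YX.T) @ Linv.T
    rho2 = np.clip(np.linalg.eigvalsh(K)[::-1], 0.0, None)
    return np.sqrt(rho2[:min(p, S.shape[0] - p)])

rng = np.random.default_rng(1)
p, q = 3, 4
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)

# Y -> T Y and X -> U X (T, U nonsingular) turn the joint matrix into G S G', G = diag(T, U).
T = rng.standard_normal((p, p)) + 3 * np.eye(p)
U = rng.standard_normal((q, q)) + 3 * np.eye(q)
G = np.block([[T, np.zeros((p, q))], [np.zeros((q, p)), U]])

rho = canonical_correlations(S, p)
rho_transformed = canonical_correlations(G @ S @ G.T, p)
print(rho, rho_transformed)    # the same values up to rounding
```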

3.  The  Index  of  Redundancy 

Many  scalar-valued  indices  which  are  strictly  functions  of 
the  canonical  correlations  have  been  proposed  to  measure  the 
relationship  between  two  random  vectors.  In  a  recent  paper,  Cramer 
and  Nicewander  (1979)  discuss  many  such  indices. 

However, the co-ordinate free property of the canonical correlations is not always desirable. For example, the concept of total variance is not co-ordinate free. The ability of X to predict a linear combination of Y which accounts for a large proportion of the total variance of Y may be of more interest than the ability of X to predict a linear combination of Y which accounts for a small proportion of the total variance of Y. This distinction cannot be captured by an index which is strictly a function of the canonical correlations.

In consideration of this argument, Stewart and Love (1968) proposed a measure which they called an "index of redundancy." This index is a weighted average of the squared canonical correlations, where the weights are the proportions of the total variance of Y explained by each of the canonical variates of Y. (Since the canonical vectors satisfy $b_{(i)}'\Sigma_Y b_{(j)} = 0$ for $i \ne j$, these weights sum to one by (2.5).)

DEFINITION 3.1. The index of redundancy is a measure of how "redundant Y is given X," and is defined to be

$R^2(Y : X) = \sum_{i=1}^{p} \rho_{(i)}^2\, V_e(Y : b_{(i)}'Y) / \mathrm{Trace}(\Sigma_Y)$.

The index of redundancy is an asymmetric index; that is, in general $R^2(Y : X) \ne R^2(X : Y)$. The index $R^2(Y : X)$ distinguishes between a dependent vector (Y) and an independent vector (X). If the dependent vector is univariate, then the index of redundancy reduces to the squared multiple correlation coefficient.

An important representation of the index of redundancy is given in the following lemma. This representation is discussed by Stewart and Love (1968) without justification; a proof can be found in Gleason (1976).

LEMMA 3.2. $R^2(Y : X) = \mathrm{Trace}(\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}) / \mathrm{Trace}(\Sigma_Y)$.
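Lemma 3.2 can be checked against Definition 3.1 numerically. The NumPy sketch below rests on a few assumptions: the helper names, the random joint matrix and the seed are illustrative, and the canonical vectors are obtained from a symmetric eigenproblem equivalent to (2.7). It also computes $R^2(X : Y)$ by swapping the roles of the two vectors, which illustrates the asymmetry of the index.

```python
import numpy as np

def blocks(S, p):
    """Split the joint matrix (2.1) into S_Y, S_YX, S_X (the first p coordinates are Y)."""
    return S[:p, :p], S[:p, p:], S[p:, p:]

def redundancy_by_definition(S, p):
    """Definition 3.1: sum_i rho_(i)^2 Ve(Y : b_(i)'Y) / Trace(S_Y)."""
    S_Y, S_YX, S_X = blocks(S, p)
    Linv = np.linalg.inv(np.linalg.cholesky(S_Y))
    K = Linv @ S_YX @ np.linalg.solve(S_X, S_YX.T) @ Linv.T
    rho2, V = np.linalg.eigh(K)                # eigenvalues are the rho_(i)^2, see (2.7)
    Bcan = Linv.T @ V                          # canonical vectors with b' S_Y b = 1
    ve = np.einsum("ji,jk,ki->i", Bcan, S_Y @ S_Y, Bcan)   # Ve(Y : b'Y) = b' S_Y^2 b here
    return float(np.sum(rho2 * ve) / np.trace(S_Y))

def redundancy_by_trace(S, p):
    """Lemma 3.2: Trace(S_YX S_X^{-1} S_XY) / Trace(S_Y)."""
    S_Y, S_YX, S_X = blocks(S, p)
    return float(np.trace(S_YX @ np.linalg.solve(S_X, S_YX.T)) / np.trace(S_Y))

rng = np.random.default_rng(2)
p, q = 3, 2
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)
print(redundancy_by_definition(S, p), redundancy_by_trace(S, p))   # agree

# Swapping the roles of Y and X gives R^2(X : Y), which is generally different.
order = np.r_[p:p + q, 0:p]
print(redundancy_by_trace(S[np.ix_(order, order)], q))
```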

This lemma states that the index of redundancy is the proportional reduction from the total variance of Y to the total variance of the residual vector $Y - \hat{Y}$, where

(3.1)  $\hat{Y} = \Sigma_{YX}\Sigma_X^{-1} X$

is the linear regression of Y on X. In other words, the index is the proportion of the total variance of Y which can be explained by the linear regression of Y on X. It should be noted that Rao (1964) informally used this concept to measure what he called the "predictive efficiency" of X for Y. Moreover, by statement (8.6) in Rao (1964), the index of redundancy can be decomposed over any complete set of uncorrelated linear combinations of the independent vector. That is, if $a_1, a_2, \ldots, a_q$ is any set of non-zero vectors such that $a_i'\Sigma_X a_j = 0$ for $i \ne j$, then

(3.2)  $R^2(Y : X) = \sum_{j=1}^{q} R^2(Y : a_j'X)$.

It is interesting to observe that statement (3.2) is a generalized version of the summation given in the definition of the index of redundancy, since

(3.3)  $R^2(Y : a_{(i)}'X) = \rho_{(i)}^2\, V_e(Y : b_{(i)}'Y) / \mathrm{Trace}(\Sigma_Y)$.
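Statement (3.2) can be illustrated with a complete set of $\Sigma_X$-orthogonal vectors that are not canonical vectors. In this NumPy sketch, the helper name `r2_on_combinations`, the random joint matrix and the inverse Cholesky construction of the $a_j$ are assumptions made for the example. Taking $A = I$ recovers Lemma 3.2.

```python
import numpy as np

def r2_on_combinations(S, p, A):
    """R^2(Y : A'X): share of Trace(S_Y) explained by the regression of Y on A'X."""
    S_Y, S_YX, S_X = S[:p, :p], S[:p, p:], S[p:, p:]
    A = np.asarray(A, dtype=float).reshape(S_X.shape[0], -1)   # a vector becomes a (q x 1) matrix
    C = S_YX @ A                                               # Cov(Y, A'X)
    return float(np.trace(C @ np.linalg.solve(A.T @ S_X @ A, C.T)) / np.trace(S_Y))

rng = np.random.default_rng(3)
p, q = 2, 4
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)
S_X = S[p:, p:]

# A complete set with a_i' S_X a_j = 0 for i != j (from the inverse Cholesky factor of S_X).
A = np.linalg.inv(np.linalg.cholesky(S_X)).T
pieces = [r2_on_combinations(S, p, A[:, j]) for j in range(q)]
full = r2_on_combinations(S, p, np.eye(q))    # A = I gives Lemma 3.2
print(sum(pieces), full)                       # (3.2): equal up to rounding
```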

By using the representation for the index of redundancy given in Lemma 3.2, we see that the index has the following important property.

LEMMA 3.3. If P is orthogonal and A is nonsingular, then $R^2(Y : X) = R^2(PY : AX)$.

The  index  of  redundancy,  however,  is  not  invariant  under  arbitrary 
nonsingular  transformations  of  the  dependent  vector. 
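Lemma 3.3 and its limitation can both be seen numerically. This NumPy sketch assumes a random joint matrix, a random orthogonal P (from a QR factorization), a random nonsingular A and a hypothetical diagonal rescaling `Dg` of the components of Y; none of these come from the report. The transformed covariance blocks follow from Var(PY) = P Σ_Y P', Cov(PY, AX) = P Σ_YX A' and Var(AX) = A Σ_X A'.

```python
import numpy as np

def redundancy_index(S_Y, S_YX, S_X):
    """Lemma 3.2: R^2(Y : X) = Trace(S_YX S_X^{-1} S_XY) / Trace(S_Y)."""
    return float(np.trace(S_YX @ np.linalg.solve(S_X, S_YX.T)) / np.trace(S_Y))

rng = np.random.default_rng(4)
p, q = 3, 3
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)
S_Y, S_YX, S_X = S[:p, :p], S[:p, p:], S[p:, p:]

P, _ = np.linalg.qr(rng.standard_normal((p, p)))   # orthogonal
A = rng.standard_normal((q, q)) + 3 * np.eye(q)     # nonsingular (almost surely)
Dg = np.diag([1.0, 5.0, 0.2])                       # nonsingular but not orthogonal

base = redundancy_index(S_Y, S_YX, S_X)
# (PY, AX): Var(PY) = P S_Y P', Cov(PY, AX) = P S_YX A', Var(AX) = A S_X A'
rotated = redundancy_index(P @ S_Y @ P.T, P @ S_YX @ A.T, A @ S_X @ A.T)
# Rescaling the components of Y is not covered by Lemma 3.3
scaled = redundancy_index(Dg @ S_Y @ Dg.T, Dg @ S_YX, S_X)
print(base, rotated, scaled)
```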

REMARK. In the definition of the index of redundancy, it is assumed that the joint variance-covariance matrix of the vectors is nonsingular. This is a consequence of the use of canonical correlations and variables in the definition. In view of the representation of the index given by Lemma 3.2, Nicewander and Wood (1975) note that the index of redundancy can be logically extended in the following manner. If $\mathrm{rank}(\Sigma_Y) \ge 1$ and $\Sigma_X$ is nonsingular, then define

(3.4)  $R^2(Y : X) \equiv \mathrm{Trace}(\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}) / \mathrm{Trace}(\Sigma_Y)$.

If $\mathrm{rank}(\Sigma_Y) \ge 1$ and $\mathrm{rank}(\Sigma_X) = r < q$, then define

(3.5)  $R^2(Y : X) \equiv R^2(Y : B'X)$,

where B is a (q × r) matrix such that $B'\Sigma_X B$ is nonsingular. This definition does not depend upon the choice of B. These extensions of the index of redundancy also represent the proportion of the total variance of Y which can be explained by the linear regression of Y on X.
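The claim that (3.5) does not depend on B can be checked with a singular $\Sigma_X$. To obtain one, this NumPy sketch uses a hypothetical setup that is not in the report: X = F Z for a q-dimensional X driven by an r-dimensional Z with r < q, where (Y, Z) has a positive definite covariance. Two random choices of B are compared with each other and with the regression of Y on Z, which spans the same linear combinations.

```python
import numpy as np

def r2_on_combinations(S_Y, S_YX, S_X, B):
    """R^2(Y : B'X) = Trace[S_YX B (B' S_X B)^{-1} B' S_XY] / Trace(S_Y)."""
    C = S_YX @ B
    return float(np.trace(C @ np.linalg.solve(B.T @ S_X @ B, C.T)) / np.trace(S_Y))

rng = np.random.default_rng(5)
p, q, r = 2, 4, 2
M = rng.standard_normal((p + r, p + r))
S_W = M @ M.T + 0.5 * np.eye(p + r)   # Var of (Y, Z), positive definite (hypothetical setup)
F = rng.standard_normal((q, r))       # X = F Z, so Var(X) has rank r < q
S_Y = S_W[:p, :p]
S_YX = S_W[:p, p:] @ F.T
S_X = F @ S_W[p:, p:] @ F.T           # singular (q x q)

# (3.5) with two different (q x r) matrices B for which B' S_X B is nonsingular.
v1 = r2_on_combinations(S_Y, S_YX, S_X, rng.standard_normal((q, r)))
v2 = r2_on_combinations(S_Y, S_YX, S_X, rng.standard_normal((q, r)))
# Cross-check: regressing Y on Z explains the same share of Trace(S_Y).
v_latent = r2_on_combinations(S_Y, S_W[:p, p:], S_W[p:, p:], np.eye(r))
print(v1, v2, v_latent)   # equal up to rounding
```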
4. The Redundancy Transformations

In practice, the index of redundancy is usually used as a summary index in conjunction with canonical correlation and variable analysis (see, for example, Stewart (1967), Briggs and Leonard (1977a, 1977b), Oostendorp and Berlyne (1978) or Cohen, Gaughran and Cohen (1979)). This practice is suggested not only by Stewart and Love, but also in the review paper by Tatsuoka (1973) and in the applied multivariate analysis books by Cooley and Lohnes (1971) and Timm (1975). In addition, the index of redundancy is included in a recent canonical analysis computer program by Thompson and Frankiewicz (1979).

It is argued, though, by Nicewander and Wood (1974, 1975) and by Cramer and Nicewander (1979) that the association of the index of redundancy with canonical correlation and variable analysis is somewhat artificial. Canonical correlation and variable analysis does not distinguish between a dependent and an independent vector, whereas the index of redundancy does. In addition, the index of redundancy is invariant only under orthogonal transformations of the dependent vector, whereas the canonical correlations and variables are invariant under all nonsingular transformations of the dependent vector.

In view of this argument, the next theorem introduces simultaneous transformations of the two random vectors which are more appropriate than the standard canonical transformations for use in conjunction with the index of redundancy. These transformations should prove useful in studies where a distinction is made between the dependent vector and the independent vector, and where invariance is desired only under orthogonal transformations of the dependent vector.

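THEOREM 4.1 (The Redundancy Transformations). Let

$\Sigma_{Y,X} = \begin{bmatrix} \Sigma_Y & \Sigma_{YX} \\ \Sigma_{XY} & \Sigma_X \end{bmatrix}$

be a positive definite matrix of order (p + q). There exist an orthogonal matrix $\mathbf{Y}$ and a nonsingular matrix $\mathbf{X}$ such that

$\begin{bmatrix} \mathbf{Y} & 0 \\ 0 & \mathbf{X} \end{bmatrix}' \Sigma_{Y,X} \begin{bmatrix} \mathbf{Y} & 0 \\ 0 & \mathbf{X} \end{bmatrix} = \begin{bmatrix} \mathbf{Y}'\Sigma_Y\mathbf{Y} & D \\ D' & I \end{bmatrix}$,

where

$D = [\Delta : 0]$ if p < q,  $D = \Delta$ if p = q,  $D = \begin{bmatrix} \Delta \\ 0 \end{bmatrix}$ if p > q,

and $\Delta$ is a diagonal matrix of order min(p,q) with diagonal entries

$\lambda_{(1)}^{1/2} \ge \lambda_{(2)}^{1/2} \ge \cdots \ge \lambda_{(\min[p,q])}^{1/2} \ge 0$.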

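PROOF. The matrix $\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}$ is positive semi-definite of order p. Let $\lambda_{(1)} \ge \lambda_{(2)} \ge \cdots \ge \lambda_{(p)} \ge 0$ be its eigenvalues, and choose $\mathbf{Y}$ such that its ith column, denoted by $y_i$, is an eigenvector of $\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}$ associated with the eigenvalue $\lambda_{(i)}$, chosen such that $y_i'y_j = \delta_{ij}$, where $\delta_{ij}$ represents the Kronecker delta. By construction, $\mathbf{Y}$ is an orthogonal matrix. Choose $\mathbf{X}$ such that its jth column is $x_j = \lambda_{(j)}^{-1/2}\Sigma_X^{-1}\Sigma_{XY} y_j$ for $j = 1, 2, \ldots, r$, where $r = \mathrm{rank}(\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY})$. It is easy to verify that $x_j$, $j = 1, 2, \ldots, r$, satisfy the equation $\Sigma_X^{-1}\Sigma_{XY}\Sigma_{YX} x_j = \lambda_{(j)} x_j$. Thus, we have $x_i'\Sigma_X x_j = \delta_{ij}$ for $i, j = 1, 2, \ldots, r$. To complete the definition of $\mathbf{X}$, if q > r, let its remaining columns, denoted by $x_{r+1}, x_{r+2}, \ldots, x_q$, be solutions to the equation $\Sigma_X^{-1}\Sigma_{XY}\Sigma_{YX} x = 0$ (equivalently, $\Sigma_{YX} x = 0$) such that $x_i'\Sigma_X x_j = \delta_{ij}$ for $i, j = r+1, r+2, \ldots, q$. For $i \le r < j$ we also have $x_i'\Sigma_X x_j = \lambda_{(i)}^{-1/2} y_i'\Sigma_{YX} x_j = 0$, so by construction $x_i'\Sigma_X x_j = \delta_{ij}$ for $i, j = 1, 2, \ldots, q$. Finally, for $i \le r$, $y_i'\Sigma_{YX} x_j = \lambda_{(i)}^{1/2}\delta_{ij}$, and for $i > r$, $y_i'\Sigma_{YX} x_j = 0$. Thus, the proof is complete.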
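The construction in the proof can be carried out numerically. The following NumPy sketch is one possible implementation under stated assumptions: the function name `redundancy_transformations`, the rank tolerance `tol`, and the use of an SVD to find solutions of $\Sigma_{YX}x = 0$ are choices made here, not prescriptions of the report. It returns $\mathbf{Y}$, $\mathbf{X}$ and the redundancy roots, and checks the block form of Theorem 4.1 for a case with p < q.

```python
import numpy as np

def redundancy_transformations(S, p, tol=1e-10):
    """Sketch of the construction in the proof of Theorem 4.1 (first p coordinates are Y).
    Returns (Ymat, Xmat, lam): Ymat orthogonal (p x p), Xmat nonsingular (q x q),
    lam the eigenvalues of S_YX S_X^{-1} S_XY in decreasing order."""
    S_Y, S_YX, S_X = S[:p, :p], S[:p, p:], S[p:, p:]
    q = S_X.shape[0]
    K = S_YX @ np.linalg.solve(S_X, S_YX.T)          # S_YX S_X^{-1} S_XY, symmetric PSD
    lam, Ymat = np.linalg.eigh(K)
    order = np.argsort(lam)[::-1]
    lam, Ymat = np.clip(lam[order], 0.0, None), Ymat[:, order]
    r = int(np.sum(lam > tol * max(lam[0], 1.0)))   # numerical rank of K
    # x_j = lam_j^{-1/2} S_X^{-1} S_XY y_j for j = 1, ..., r
    X_first = np.linalg.solve(S_X, S_YX.T @ Ymat[:, :r]) / np.sqrt(lam[:r])
    if r == q:
        return Ymat, X_first, lam
    # Remaining columns: solutions of S_YX x = 0, made S_X-orthonormal.
    _, _, Vt = np.linalg.svd(S_YX)
    N = Vt[r:].T
    L = np.linalg.cholesky(N.T @ S_X @ N)
    X_rest = N @ np.linalg.inv(L).T
    return Ymat, np.hstack([X_first, X_rest]), lam

rng = np.random.default_rng(6)
p, q = 2, 4
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)
Ymat, Xmat, lam = redundancy_transformations(S, p)
D = Ymat.T @ S[:p, p:] @ Xmat
print(np.round(D, 6))                              # [Delta : 0], Delta = diag(sqrt(lam))
print(np.round(Xmat.T @ S[p:, p:] @ Xmat, 6))      # identity
```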

The matrices $\mathbf{X}$ and $\mathbf{Y}$ of Theorem 4.1 are not unique. The next theorem, however, shows that any $\mathbf{X}$ and $\mathbf{Y}$ which satisfy Theorem 4.1 must be of the form constructed in the proof.


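THEOREM 4.2. If $\mathbf{Y}$ and $\mathbf{X}$ are matrices which satisfy Theorem 4.1, with $y_i$ and $x_i$ the ith columns of $\mathbf{Y}$ and $\mathbf{X}$ respectively, then

$\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}\, y_i = \lambda_{(i)}\, y_i$,
$\Sigma_X^{-1}\Sigma_{XY}\Sigma_{YX}\, x_i = \lambda_{(i)}\, x_i$, and
$\lambda_{(i)}^{1/2}\, x_i = \Sigma_X^{-1}\Sigma_{XY}\, y_i$,

with $\lambda_{(1)} \ge \lambda_{(2)} \ge \cdots \ge \lambda_{(r)}$ being the non-zero roots of $\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}$ and $\lambda_{(r+1)} = \lambda_{(r+2)} = \cdots = \lambda_{(\max[p,q])} = 0$.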

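PROOF. Let $\mathbf{Y}$ and $\mathbf{X}$ be matrices satisfying Theorem 4.1; then $\mathbf{Y}'\Sigma_{YX}\mathbf{X} = D$ and $\mathbf{X}'\Sigma_X\mathbf{X} = I$. Since $\mathbf{Y}$ is orthogonal, this gives

(4.1)  $\Sigma_{YX} = \mathbf{Y} D \mathbf{X}^{-1}$ and $\Sigma_X^{-1} = \mathbf{X}\mathbf{X}'$.

Thus, $\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY} = \mathbf{Y}D\mathbf{X}^{-1}\mathbf{X}\mathbf{X}'(\mathbf{X}')^{-1}D'\mathbf{Y}' = \mathbf{Y}DD'\mathbf{Y}' = \mathbf{Y}\Lambda_1\mathbf{Y}'$, where $\Lambda_1 = \mathrm{diagonal}(\lambda_{(1)}, \lambda_{(2)}, \ldots, \lambda_{(p)})$. This implies that $y_i$ and $\lambda_{(i)}$, $i = 1, 2, \ldots, p$, satisfy $\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}\, y_i = \lambda_{(i)} y_i$. Likewise, $\Sigma_X^{-1}\Sigma_{XY}\Sigma_{YX} = \mathbf{X}\mathbf{X}'(\mathbf{X}')^{-1}D'\mathbf{Y}'\mathbf{Y}D\mathbf{X}^{-1} = \mathbf{X}D'D\mathbf{X}^{-1} = \mathbf{X}\Lambda_2\mathbf{X}^{-1}$, where $\Lambda_2 = \mathrm{diagonal}(\lambda_{(1)}, \lambda_{(2)}, \ldots, \lambda_{(q)})$. This implies that $x_i$ and $\lambda_{(i)}$, $i = 1, 2, \ldots, q$, satisfy $\Sigma_X^{-1}\Sigma_{XY}\Sigma_{YX}\, x_i = \lambda_{(i)} x_i$. Finally, $\Sigma_X^{-1}\Sigma_{XY}\, y_i = \mathbf{X}\mathbf{X}'(\mathbf{X}')^{-1}D'\mathbf{Y}'y_i = \mathbf{X}D'\mathbf{Y}'y_i = \lambda_{(i)}^{1/2} x_i$. Thus, the proof is complete.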


The transformations $\mathbf{Y}'$ and $\mathbf{X}'$, when simultaneously applied to Y and X respectively, are referred to as the redundancy transformations. When Y and X are thus transformed, Theorem 4.1 gives the resulting joint variance-covariance matrix. In addition, $y_i'Y$ and $x_i'X$ are referred to as the ith redundancy variables, $y_i$ and $x_i$ as the ith redundancy vectors, and $\lambda_{(i)}$ as the ith redundancy root. It easily follows that the redundancy roots and variables are invariant under orthogonal transformations of Y and nonsingular transformations of X. The redundancy vectors are equivariant under these transformations.

As an exploratory technique, the redundancy transformations do not simplify the joint variance-covariance matrix to the extent that the canonical transformations do. This is to be expected, since less information on the joint variance-covariance matrix is sacrificed when only orthogonal transformations of Y are considered. In particular, the index of redundancy is preserved. That is, by applying Lemma 3.3 with $P = \mathbf{Y}'$ and $A = \mathbf{X}'$, we obtain

(4.2)  $R^2(Y : X) = R^2(\mathbf{Y}'Y : \mathbf{X}'X) = \sum_{i=1}^{\min(p,q)} \lambda_{(i)} / \mathrm{Trace}(\Sigma_Y)$.
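Statement (4.2) says the redundancy roots partition the index, just as the weighted squared canonical correlations do in Definition 3.1. The NumPy sketch below (random joint matrix and seed are assumptions) compares $\sum_i \lambda_{(i)}/\mathrm{Trace}(\Sigma_Y)$ with Lemma 3.2 and prints the redundancy roots next to the squared canonical correlations, which are generally different quantities.

```python
import numpy as np

rng = np.random.default_rng(7)
p, q = 3, 2
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)
S_Y, S_YX, S_X = S[:p, :p], S[:p, p:], S[p:, p:]

K = S_YX @ np.linalg.solve(S_X, S_YX.T)        # S_YX S_X^{-1} S_XY
lam = np.linalg.eigvalsh(K)[::-1]               # redundancy roots, decreasing
index = np.trace(K) / np.trace(S_Y)             # Lemma 3.2
print(index, lam.sum() / np.trace(S_Y))         # (4.2): equal up to rounding

# Squared canonical correlations: eigenvalues of S_Y^{-1} S_YX S_X^{-1} S_XY, from (2.7).
rho2 = np.sort(np.linalg.eigvals(np.linalg.solve(S_Y, K)).real)[::-1]
print(lam)     # depend on the scale of the components of Y
print(rho2)    # do not; the two sets generally differ
```

Here p > q, so the smallest redundancy root is zero, in line with Theorem 4.2.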

5.  The  Optimality  of  the  Redundancy  Transformations 

When the index of redundancy is used in conjunction with canonical correlation and variable analysis, the value of (3.3), rather than the canonical correlations themselves, is usually used to help determine which canonical variables deserve interpretation and further attention. This approach, for example, is applied in the previously mentioned studies of Briggs and Leonard (1977a, 1977b), Oostendorp and Berlyne (1978), Cohen, Gaughran and Cohen (1979), and Laessig and Duckett (1979). Likewise, this approach is recommended by Stewart and Love (1968), in the review paper by Tatsuoka (1973) and in the books by Cooley and Lohnes (1971) and Timm (1975).

This practice of using the value of (3.3) for each of the canonical variables to reduce, in essence, the dimensionality of the two sets of multivariate responses is not an optimal procedure. The canonical variables are extracted because they best explain the intercorrelations between the sets of responses. They are not necessarily the best linear combinations to consider when attempting to account for the overall size of the index of redundancy. It is shown in this section that the redundancy variables are best suited for this purpose. Before doing so, it is first necessary to extend the concept of the contribution made by a canonical variable to the overall size of the index of redundancy, which is given by (3.3), to the contribution made by any set of linear combinations of X or by any set of linear combinations of Y.

A natural extension for an arbitrary set of linear combinations of the independent vector is the proportion of the total variance of the dependent vector which can be explained by its linear regression on these linear combinations only. That is, the value of $R^2(Y : A'X)$ can be considered as the contribution made by the set of linear combinations A'X to the overall size of the index of redundancy. Thus defined, the contributions to the index made by uncorrelated linear combinations of the independent vector are additive. If $A = [A_1 : A_2]$ with $A_1'\Sigma_X A_2 = 0$, then

(5.1)  $R^2(Y : A'X) = R^2(Y : A_1'X) + R^2(Y : A_2'X)$.

In particular, if $A_0$ is a (q × k) matrix with $\mathrm{rank}(A_0) = k$ whose columns are a subset of the canonical vectors, say $\{a_{(i)}, i \in I\}$, then we have the desired result

(5.2)  $R^2(Y : A_0'X) = \sum_{i \in I} \rho_{(i)}^2\, V_e(Y : b_{(i)}'Y) / \mathrm{Trace}(\Sigma_Y)$.

For an arbitrary set of linear combinations of the dependent vector, a suitable extension is not obvious. One extension, proposed by Miller (1969) and Miller and Farr (1971) for any linear combination b'Y, is the product $R^2(b'Y : X)\,R^2(Y : b'Y)$. Their work is discussed in more detail in the appendix of this paper. In particular, it is shown in the appendix that this product can be greater than the index of redundancy itself. Thus, an alternative generalization is needed.

To motivate an alternative generalization, let $\hat{Y}$ be the linear regression (3.1) of Y on X, and note that

(5.3)  $R^2(Y : B_0'\hat{Y}) = \sum_{i \in I} \rho_{(i)}^2\, V_e(Y : b_{(i)}'Y) / \mathrm{Trace}(\Sigma_Y)$,

where $B_0$ is a (p × k) matrix with $\mathrm{rank}(B_0) = k$ whose columns are the canonical vectors $\{b_{(i)}, i \in I\}$. So, in general, it is proposed that $R^2(Y : B'\hat{Y})$ be considered as the contribution made by the set of linear combinations B'Y to the overall size of the index of redundancy. This quantity represents the proportion of the total variance of Y which can be accounted for by $B'\hat{Y}$, the linear regression of B'Y on X. As defined, the contributions to the index of redundancy made by uncorrelated linear combinations of the dependent vector are not necessarily additive. The contributions made by linear combinations of Y whose linear regressions on X are uncorrelated, though, are additive. That is, if $B = [B_1 : B_2]$ with $B_1'\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}B_2 = 0$, then

(5.4)  $R^2(Y : B'\hat{Y}) = R^2(Y : B_1'\hat{Y}) + R^2(Y : B_2'\hat{Y})$.


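In particular, if $b_1, b_2, \ldots, b_p$ are any set of linearly independent vectors such that $b_i'\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}b_j = 0$ for $i \ne j$, then

(5.5)  $R^2(Y : X) = \sum_{i=1}^{p} R^2(Y : b_i'\hat{Y})$,

where a term is taken to be zero when $b_i'\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}b_i = 0$.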

In view of these extensions of (3.3), the main optimality property of the redundancy transformations is given in the next theorem. This theorem states that, of all sets of k pairs of linear combinations of X and Y, the redundancy variables associated with the k largest redundancy roots best account for the overall size of the index of redundancy.

THEOREM 5.1. Let $x_i$ and $y_i$ be defined as in Theorem 4.2, let $\mathbf{X}_k$ be the (q × k) matrix with columns $x_1, x_2, \ldots, x_k$, and let $\mathbf{Y}_k$ be the (p × k) matrix with columns $y_1, y_2, \ldots, y_k$.

(i) For any (q × s) matrix A with $\mathrm{rank}(A) \le k$,
    $R^2(Y : A'X) \le R^2(Y : \mathbf{X}_k'X)$.

(ii) For any (p × s) matrix B with $\mathrm{rank}(B) \le k$,
    $R^2(Y : B'\hat{Y}) \le R^2(Y : \mathbf{Y}_k'\hat{Y})$.
Before proving Theorem 5.1, it is interesting to note that

(5.6)  $R^2(Y : \mathbf{X}_k'X) = R^2(Y : \mathbf{Y}_k'\hat{Y}) = \sum_{i=1}^{k} \lambda_{(i)} / \mathrm{Trace}(\Sigma_Y)$.


PROOF OF THEOREM 5.1. Part (i) follows from the results of Rao (1964), Section 8. In that paper, it is shown that the quantity $\mathrm{Trace}[\Sigma_Y - \Sigma_{YX}A'(A\Sigma_X A')^{-1}A\Sigma_{XY}]$ is minimized over all A of order (k × q) with rank(A) = k by choosing $A' = \mathbf{X}_k$. For all such A, the inequality in part (i) holds, since $R^2(Y : AX) = \mathrm{Trace}[\Sigma_{YX}A'(A\Sigma_X A')^{-1}A\Sigma_{XY}] / \mathrm{Trace}(\Sigma_Y)$. The inequality easily extends to any A of order (s × q) with rank(A) ≤ k (see the remark at the end of Section 3).

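To prove part (ii), we note that, for all B of rank less than or equal to k, $R^2(Y : B'\hat{Y})$ is maximized by choosing B such that $B'\Sigma_{YX}\Sigma_X^{-1} = M\mathbf{X}_k'$, where M has full rank. This follows from part (i), since $B'\hat{Y} = (B'\Sigma_{YX}\Sigma_X^{-1})X$ is a set of at most k linearly independent linear combinations of X. If k ≤ r, where $r = \mathrm{rank}(\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY})$, then by using the representation for $\Sigma_{YX}$ and $\Sigma_X^{-1}$ given in (4.1), we have

$\mathbf{Y}_k'\Sigma_{YX}\Sigma_X^{-1} = \mathbf{Y}_k'\mathbf{Y}D\mathbf{X}' = D_k\mathbf{X}_k'$, where $D_k = \mathrm{diagonal}(\lambda_{(1)}^{1/2}, \lambda_{(2)}^{1/2}, \ldots, \lambda_{(k)}^{1/2})$,

and $D_k$ has full rank. If k > r, part (ii) is immediate, since $R^2(Y : \mathbf{Y}_k'\hat{Y}) = R^2(Y : X)$.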

After reducing a multivariate response to a smaller set of linear combinations, it is customary in practice to consider linear transformations of the reduced set, usually to facilitate its interpretation. It is therefore important to note that the optimality property for $\mathbf{X}_k'X$ and $\mathbf{Y}_k'Y$ given in Theorem 5.1 still holds if either is transformed by a nonsingular linear transformation. However, in view of the discussion in Section 4, only orthogonal transformations of $\mathbf{Y}_k'Y$ would be appropriate.

6. Concluding Remarks

The redundancy transformation for the independent vector X was first introduced by Rao (1964). He referred to this transformation as the principal components transformation for the instrumental variable X with respect to the variable Y. This transformation also arises in reduced rank regression problems (see Brillinger (1975), Theorem 10.21, or Izenman (1976)). The redundancy transformation for the dependent vector Y is the principal components transformation for $\hat{Y}$.

In this paper, these two transformations are viewed as being naturally related to each other and to the index of redundancy. In exploring the relationship between two multivariate responses, it should prove desirable to have a transformation for one of the responses which is accompanied by a suitable transformation for the other response.

It must be acknowledged that Van den Wollenberg (1977) also relates the index of redundancy to the redundancy transformation for the vector X. He does not refer to Rao's paper and derives this transformation independently. In Van den Wollenberg's paper, it is suggested that Y be transformed in a manner similar to the transformation for X, that is, by using the eigenvectors of $\Sigma_Y^{-1}\Sigma_{YX}\Sigma_{XY}$. In this approach, the transformation for Y is not related to the transformation for X.

APPENDIX. A counterexample to a result by Miller and Farr.

In defining the index of redundancy, Stewart and Love refer to the value of (3.3) as "the proportion of variance of the Y set explained by the correlation between $a_{(i)}'X$ and $b_{(i)}'Y$." Stewart and Love observe that this quantity is the proportion of the variance of the Y set "extracted" by the canonical variate $b_{(i)}'Y$ times the proportion of the variance of $b_{(i)}'Y$ which is "predictable" from X.

Recognizing that linear combinations other than the canonical variates are often of interest, Miller (1969) and Miller and Farr (1971) proposed a generalization of the above concept. Stated in the notation established in this paper, they suggest that for any linear combination b'Y, the product

(A.1)  $R^2(Y : b'Y)\,R^2(b'Y : X)$

can be considered as "the proportion of the total variance in Y explained by X with respect to the component b'Y." This quantity is the proportion of the total variance of Y which can be "extracted" by the variable b'Y times the proportion of the variance of b'Y which can be "explained" by X. Miller and Farr call this concept a "multiplication law."
In addition, they claim that if $b_1, b_2, \ldots, b_p$ are a set of vectors such that $b_i'\Sigma_Y b_j = 0$ for $i \ne j$, then "because of the orthogonality of the components $b_i'Y$," the proportions of the total variance in Y explained by X with respect to each of the components can be added to obtain $R^2(Y : X)$. That is,

(A.2)  $R^2(Y : X) = \sum_{i=1}^{p} R^2(Y : b_i'Y)\,R^2(b_i'Y : X)$.

This summation is equivalent to the summation defining the index of redundancy if the set of vectors $\{b_i\}$ is chosen to be the canonical vectors, since $R^2(b_{(i)}'Y : X) = \rho_{(i)}^2$ and $R^2(Y : b_{(i)}'Y) = V_e(Y : b_{(i)}'Y) / \mathrm{Trace}(\Sigma_Y)$.

In both Miller (1969) and Miller and Farr (1971), statement (A.2) is justified by informal arguments, which they accompany with an example using the principal component vectors. Statement (A.2), however, is incorrect. Also, to view $R^2(Y : b'Y)\,R^2(b'Y : X)$ as the proportion of the total variance in Y explained by X with respect to the component b'Y is misleading. The following counterexample to (A.2) shows that it is possible to have $R^2(Y : b'Y)\,R^2(b'Y : X) > R^2(Y : X)$.

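COUNTEREXAMPLE. Let the joint variance-covariance matrix of $Y_{2 \times 1}$ and $X_{1 \times 1}$ be

$\Sigma_{Y,X} = \begin{bmatrix} 1.0 & 0.1 & 0.1 \\ 0.1 & 1.0 & 0.0 \\ 0.1 & 0.0 & 1.0 \end{bmatrix}$.

This gives $R^2(Y : X) = .005$. Also, if b' = (1.0  0.0), then $R^2(Y : b'Y) = .505$ and $R^2(b'Y : X) = .01$. Thus $R^2(Y : b'Y)\,R^2(b'Y : X) = .00505$, which already exceeds $R^2(Y : X)$. Statement (A.2) is then contradicted by choosing $b_1' = (1.0\ \ 0.0)$ and $b_2' = (-0.1\ \ 1.0)$, which satisfy $b_1'\Sigma_Y b_2 = 0$; since every term in (A.2) is nonnegative, its right-hand side is at least .00505 > .005.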
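The arithmetic of the counterexample is easy to reproduce. The sketch below uses only the Python standard library; the helper names (`matvec`, `dot`, `r2_Y_on_bY` and so on) are illustrative and not from the report. Each quantity is computed directly from the displayed matrix using Lemma 3.2, the definition of $V_e$ in (2.3), and the squared correlation of b'Y with X.

```python
# Counterexample to (A.2): Y is (2 x 1), X is (1 x 1), joint matrix as displayed above.
S_Y = [[1.0, 0.1], [0.1, 1.0]]
s_YX = [0.1, 0.0]       # Cov(Y, X)
s_X = 1.0               # Var(X)
trace_Y = S_Y[0][0] + S_Y[1][1]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def r2_Y_on_X():
    """Lemma 3.2 with scalar X: Trace(S_YX S_X^{-1} S_XY) / Trace(S_Y)."""
    return dot(s_YX, s_YX) / s_X / trace_Y

def r2_Y_on_bY(b):
    """R^2(Y : b'Y) = Ve(Y : b'Y) / Trace(S_Y), with Ve(Y : b'Y) = b' S_Y^2 b / b' S_Y b."""
    Sb = matvec(S_Y, b)
    return dot(Sb, Sb) / dot(b, Sb) / trace_Y

def r2_bY_on_X(b):
    """Squared correlation between b'Y and X."""
    return dot(b, s_YX) ** 2 / (dot(b, matvec(S_Y, b)) * s_X)

b1, b2 = [1.0, 0.0], [-0.1, 1.0]               # b1' S_Y b2 = 0
products = [r2_Y_on_bY(b) * r2_bY_on_X(b) for b in (b1, b2)]
print(round(r2_Y_on_X(), 6))     # 0.005
print(round(products[0], 6))     # 0.00505, larger than R^2(Y : X) on its own
print(round(sum(products), 6))   # 0.0051, so (A.2) fails
```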


It is interesting to note that if the vectors $\{b_i\}$ in statement (A.2) are chosen to be the principal component vectors, then statement (A.2) is valid. To show this, observe that if B is an orthogonal matrix, then $R^2(Y : X) = \mathrm{Trace}(B'\Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}B) / \mathrm{Trace}(\Sigma_Y)$, or equivalently

(A.3)  $R^2(Y : X) = \sum_{i=1}^{p} R^2(b_i'Y : X)\,\mathrm{var}(b_i'Y) / \mathrm{Trace}(\Sigma_Y)$,

where $b_i$ represents the ith column of B and "var" designates variance. If $b_i$ is a principal component vector, then it is well known that $\mathrm{var}(b_i'Y) = V_e(Y : b_i'Y)$, and so $\mathrm{var}(b_i'Y)/\mathrm{Trace}(\Sigma_Y) = R^2(Y : b_i'Y)$. Therefore, statements (A.2) and (A.3) are equivalent when the set of vectors $\{b_i\}$ is taken to be the principal component vectors. This explains why the example given by Miller and Farr using the principal component variables works correctly. The summation over the principal component variables and the summation over the canonical variables, however, are not two special cases of the more general statement (A.2).
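The principal component case can be confirmed numerically. In this NumPy sketch (random joint matrix and seed are assumptions), the columns of an orthogonal B are the eigenvectors of $\Sigma_Y$. The terms of (A.3) and of (A.2) are accumulated separately, and both sums are compared with Lemma 3.2.

```python
import numpy as np

rng = np.random.default_rng(10)
p, q = 3, 2
M = rng.standard_normal((p + q, p + q))
S = M @ M.T + 0.5 * np.eye(p + q)
S_Y, S_YX, S_X = S[:p, :p], S[:p, p:], S[p:, p:]
trY = np.trace(S_Y)
r2_YX = np.trace(S_YX @ np.linalg.solve(S_X, S_YX.T)) / trY    # Lemma 3.2

_, B = np.linalg.eigh(S_Y)          # principal component vectors as the columns of an orthogonal B
a2_terms, a3_terms = [], []
for b in B.T:
    var_b = b @ S_Y @ b                              # var(b'Y)
    c = S_YX.T @ b                                   # Cov(X, b'Y)
    r2_bY_X = c @ np.linalg.solve(S_X, c) / var_b    # R^2(b'Y : X)
    Sb = S_Y @ b
    r2_Y_bY = (Sb @ Sb) / var_b / trY                # R^2(Y : b'Y) = Ve(Y : b'Y) / Trace(S_Y)
    a2_terms.append(r2_Y_bY * r2_bY_X)               # term of (A.2)
    a3_terms.append(r2_bY_X * var_b / trY)           # term of (A.3)

print(r2_YX, sum(a3_terms), sum(a2_terms))           # all three agree for principal components
```

Replacing `B` by a non-orthogonal set of $\Sigma_Y$-orthogonal vectors, as in the counterexample, breaks the agreement for (A.2).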
Miller and Farr's results are often cited in papers pertaining to the index of redundancy, such as Tatsuoka (1973), Darlington, Weinberg and Walberg (1975), Dawson (1976), Briggs and Leonard (1977a, 1977b) and Cramer and Nicewander (1979). They are also cited in the multivariate analysis book by Cooley and Lohnes (1971), and by one of the authors, Miller (1975a, 1975b). However, no application of statement (A.2) has appeared in practice for which the set of vectors $\{b_i\}$ are not the canonical vectors or the principal component vectors, even though Miller and Farr recommend its use in general. This presumably accounts for the previously undetected error in statement (A.2).